Abstract

The subject of this paper is the estimation of a probability measure on $\mathbb{R}^d$ from data
observed with an additive noise, under the Wasserstein metric of order $p$ (with $p \geq 1$). We
assume that the distribution of the errors is known and belongs to a class of supersmooth 
distributions, and we give optimal rates of convergence for the Wasserstein metric of order 
p. In particular, we show how to use the existing lower bounds for the estimation of the 
cumulative distribution function in dimension one to find lower bounds for the Wasserstein 
deconvolution in any dimension. 
1 Introduction

We observe $n$ random vectors $Y_i$ in $\mathbb{R}^d$ sampled according to the convolution model:

\[ Y_i = X_i + \varepsilon_i, \tag{1} \]

where the random vectors $X_i = (X_{i,1}, \dots, X_{i,j}, \dots, X_{i,d})'$ are i.i.d. and distributed according
to an unknown probability measure $\mu$. The random vectors $\varepsilon_i = (\varepsilon_{i,1}, \dots, \varepsilon_{i,j}, \dots, \varepsilon_{i,d})'$ are
i.i.d. and distributed according to a known probability measure $\mu_\varepsilon$. The distribution of the
observations $Y_i$ on $\mathbb{R}^d$ is then the convolution $\mu \star \mu_\varepsilon$. Here, we shall assume that there exists an
invertible matrix $A$ such that the coordinates of the vector $A\varepsilon_1$ are independent (that is: the
image measure of $\mu_\varepsilon$ by $A$ is the product of its marginals).

This paper is about minimax optimal rates of convergence for estimating the measure $\mu$
under Wasserstein metrics. For $p \geq 1$, the Wasserstein distance $W_p$ between $\mu$ and $\mu'$ is defined
by:

\[ W_p(\mu, \mu') = \inf_{\pi \in \Pi(\mu, \mu')} \left( \int_{\mathbb{R}^d \times \mathbb{R}^d} \|x - y\|^p \, \pi(dx, dy) \right)^{1/p}, \]

where $\Pi(\mu, \mu')$ is the set of probability measures on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals $\mu$ and $\mu'$ and $p$
is a real number in $[1, \infty[$ (see [RR98] or [Vil08]). The norm $\|\cdot\|$ is the Euclidean norm in $\mathbb{R}^d$
corresponding to the inner product $\langle \cdot, \cdot \rangle$.
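
To make the definition concrete, here is a minimal sketch in the simplest situation: two empirical measures on the real line with the same number of atoms. In that case the infimum over couplings is attained by pairing the sorted samples, so no optimisation is needed. The function name `wasserstein_p_1d` is ours and purely illustrative; it is not part of the paper.

```python
from typing import Sequence


def wasserstein_p_1d(xs: Sequence[float], ys: Sequence[float], p: float = 1.0) -> float:
    """W_p between two empirical measures on R with the same number of atoms.

    Assumption of this sketch: both samples carry uniform weights 1/n.
    In dimension one the optimal coupling pairs the sorted samples, so
    W_p^p = (1/n) * sum_i |x_(i) - y_(i)|^p.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    if p < 1:
        raise ValueError("p must be >= 1")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    cost = sum(abs(x - y) ** p for x, y in zip(xs_sorted, ys_sorted)) / len(xs)
    return cost ** (1.0 / p)
```

For instance, translating a sample by a constant $c$ gives $W_p = |c|$ for every $p \geq 1$, which is a convenient sanity check.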

The Wasserstein deconvolution problem is interesting in itself since the distances $W_p$ are natural
for comparing measures (without assuming densities for instance). Moreover, it is also related to
recent results in geometric inference. Indeed, in 2011, [CCSM11] defined a distance function
to measures to answer geometric inference problems in a probabilistic setting. According to their
result, the topological properties of a shape can be recovered by using the distance to a known
measure $\tilde\mu$, if $\tilde\mu$ is close enough to a measure $\mu$ concentrated on this shape with respect to
the Wasserstein distance $W_2$. This fact motivates the study of the Wasserstein deconvolution
problem, since in practice the data can be observed with noise.

In the paper [CCDM11], the authors consider a slight modification of the classical kernel
deconvolution estimator, and they provide some upper bounds for the rate of convergence of
this estimator for the $W_2$ distance, for several noise distributions. Nevertheless the question of
optimal rates of convergence in the minimax sense was left open in this previous work. The
main contribution of the present paper is to find optimal rates of convergence for a class of
supersmooth distributions, for any dimension under any Wasserstein metric $W_p$. In particular
we prove that the deconvolution estimator of $\mu$ under the $W_2$ metric introduced in [CCDM11]
is minimax optimal for a class of supersmooth distributions.

The rates of convergence for deconvolving a density have been deeply studied for other metrics.
Minimax rates in the univariate context can be found for instance in [Fan91b, BT08a,
BT08b] and in the recent monograph [Mei09]. The multivariate problem has also been investigated
in [Tan94, CL11]. All these contributions concern pointwise convergence or $L^2$ convergence;
rates of convergence for the Wasserstein metrics have been studied only by [CCDM11].
In Section 2 of the present paper, we shall see that, in the supersmooth case, lower bounds for
the Wasserstein deconvolution problem in any dimension can be deduced from lower bounds for
the deconvolution of the cumulative distribution function (c.d.f.) in dimension one.

Another interesting related work is [GPPVW12]. In this recent paper, the authors find
lower and upper bounds for the risk of estimating a manifold in Hausdorff distance under
several noise assumptions. They consider in particular the additive noise model (1) with a
standard multivariate Gaussian noise.

Before giving the main result of our paper, we need some notation. Let $\nu$ be a measure on
$\mathbb{R}^d$ with density $g$ and let $m$ be another measure on $\mathbb{R}^d$. In the following we shall denote by
$m \star g$ the density of $m \star \nu$, that is

\[ m \star g(x) = \int_{\mathbb{R}^d} g(x - z)\, m(dz). \]

We also denote by $\mu^*$ (respectively $f^*$) the Fourier transform of the probability measure $\mu$
(respectively of the integrable function $f$), that is:

\[ \mu^*(x) = \int_{\mathbb{R}^d} e^{i\langle t, x\rangle}\, \mu(dt) \quad \text{and} \quad f^*(x) = \int_{\mathbb{R}^d} e^{i\langle t, x\rangle} f(t)\, dt. \]

For $M > 0$ and $p \geq 1$, let $\mathcal{D}_A(M,p)$ be the set of measures $\mu$ on $\mathbb{R}^d$ for which

\[ \sup_{1 \leq j \leq d} \mathbb{E}_\mu \Big( \big(1 + |(AX_1)_j|^{2p+2}\big) \prod_{1 \leq \ell \leq d,\ \ell \neq j} \big(1 + (AX_1)_\ell^2\big) \Big) \leq M < \infty. \]

Let us give the main result of our paper when $\varepsilon_1$ is a non degenerate Gaussian random
vector (by non degenerate, we mean that its covariance matrix is not equal to zero).

Theorem 1. Assume that we observe $Y_1, \dots, Y_n$ in the multivariate convolution model (1),
where $\varepsilon_1$ is a non degenerate Gaussian random vector. Let $A$ be an invertible matrix such that
the coordinates of $A\varepsilon_1$ are independent. Let $M > 0$ and $p \geq 1$. Then

1. There exists a constant $C > 0$ such that for any estimator $\tilde\mu_n$ of the measure $\mu$:

\[ \liminf_{n \to \infty}\, (\log n)^{p/2} \sup_{\mu \in \mathcal{D}_A(M,p)} \mathbb{E}_{(\mu \star \mu_\varepsilon)^{\otimes n}} \big( W_p^p(\tilde\mu_n, \mu) \big) \geq C. \]

2. One can build an estimator $\hat\mu_n$ of $\mu$ such that

\[ \sup_{n \geq 1}\, \sup_{\mu \in \mathcal{D}_A(M,p)} (\log n)^{p/2}\, \mathbb{E}_{(\mu \star \mu_\varepsilon)^{\otimes n}} \big( W_p^p(\hat\mu_n, \mu) \big) \leq K, \]

for some positive constant $K$.

Note that in Theorem 1, the random vector $\varepsilon_1$ may have all its coordinates, except one,
equal to zero almost surely. In other words, a Gaussian noise in one direction leads to the same
rate of convergence as an isotropic Gaussian noise.
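
To get a feeling for how slow the logarithmic rate of Theorem 1 is, the following sketch evaluates $(\log n)^{-p/\beta}$ (with $\beta = 2$ in the Gaussian case) and computes the sample size needed to halve it. The function names are ours, and constants are ignored, so the numbers only illustrate orders of magnitude.

```python
import math


def log_rate(n: int, p: float, beta: float = 2.0) -> float:
    """Return (log n)^(-p/beta), the rate for W_p^p up to multiplicative constants."""
    if n < 2:
        raise ValueError("n must be at least 2 so that log n > 0")
    return math.log(n) ** (-p / beta)


def sample_size_for_halving(n: int, p: float, beta: float = 2.0) -> float:
    """Sample size m such that log_rate(m) = log_rate(n) / 2.

    Solving (log m)^(-p/beta) = (log n)^(-p/beta) / 2 gives
    log m = 2^(beta/p) * log n, that is m = n^(2^(beta/p)).
    """
    if n < 2:
        raise ValueError("n must be at least 2 so that log n > 0")
    return float(n) ** (2.0 ** (beta / p))
```

With Gaussian noise and $p = 2$, halving the rate requires squaring the sample size; with $p = 1$ it requires raising it to the fourth power.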

The paper is organized as follows. The proof of the lower bound is given in Section 2. In
Section 3 we then give the corresponding upper bound in the same context by generalizing the
results of [CCDM11] to all $p \geq 1$. We finally discuss the $W_p$ deconvolution problem in the ordinary
smooth case in Section 4. Some additional technical results are given in the Appendix.

2 Lower bounds 
2.1 Main result 

The following theorem is the main result of this section. It gives a lower bound on the rates of
convergence of measure estimators in the supersmooth case for any dimension and under any
metric $W_p$.

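Theorem 2. Let $M > 0$, $p \geq 1$. Assume that we observe $Y_1, \dots, Y_n$ in the multivariate
convolution model (1). Assume that there exists $j_0 \in \{1, \dots, d\}$ such that the coordinate $(A\varepsilon_1)_{j_0}$
has a density $g$ with respect to the Lebesgue measure satisfying for all $w \in \mathbb{R}$:

\[ |g^*(w)|\, (1 + |w|)^{-\tilde\beta} \exp(|w|^\beta / \gamma) \leq c_1 \tag{2} \]

for some $\beta > 0$ and some $\tilde\beta \in \mathbb{R}$. Also assume that there exist some constants $\kappa_1 \in (0,1)$ and
$\kappa_2 > 1$ such that

\[ \mathbb{P}\big( |(A\varepsilon_1)_{j_0} - t| \leq |t|^{\kappa_1} \big) = O(|t|^{-\kappa_2}) \quad \text{as } |t| \to \infty \tag{3} \]

and

\[ \max\Big( p + 2,\ \frac{\kappa_2}{2\kappa_1} + \frac{1}{2} \Big) < \kappa_2. \tag{4} \]

Then there exists a constant $C > 0$ such that for any estimator $\tilde\mu_n$ of the measure $\mu$:

\[ \liminf_{n \to \infty}\, (\log n)^{p/\beta} \sup_{\mu \in \mathcal{D}_A(M,p)} \mathbb{E}_{(\mu \star \mu_\varepsilon)^{\otimes n}} W_p^p(\tilde\mu_n, \mu) \geq C. \]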

The assumption about the random variable $(A\varepsilon_1)_{j_0}$ means that the noise is supersmooth in
at least one direction. Indeed, as shown in Section 2.1.1, the lower bound for the multivariate
problem can be deduced from the lower bound for the $L^1$ estimation of the c.d.f. of $(A\varepsilon_1)_{j_0}$.
If the distribution of the noise is supersmooth in several directions then one may choose the
direction with the greatest coefficient $\beta$.

The assumption (3) is classical in the deconvolution setting, see for instance [Fan91b, Fan92].
The technical assumption (4) summarizes the conditions on $p$ and $\kappa_2$. The condition
$\kappa_2/(2\kappa_1) + 1/2 < \kappa_2$ is also required in [Fan91b] and [Fan92]. The additional condition $p + 2 < \kappa_2$ is a consequence
of the moment assumption on $\mu$.

If g has a polynomial decay rate at infinity, we can state the following lemma: 
Lemma 1. Assume that the density $g$ of $(A\varepsilon_1)_{j_0}$ satisfies $|g(t)| = O(|t|^{-a})$ as $|t|$ tends to infinity,
for some $a > p + 3$. Then one can find $\kappa_1 \in (0,1)$ and $\kappa_2 > 1$ such that Conditions (3) and (4)
are satisfied.

Proof. For $t > 0$ large enough,

\[ \mathbb{P}\big( |(A\varepsilon_1)_{j_0} - t| \leq t^{\kappa_1} \big) = \int_{t - t^{\kappa_1}}^{t + t^{\kappa_1}} g(u)\, du = O\big( (t - t^{\kappa_1})^{-p-2} - (t + t^{\kappa_1})^{-p-2} \big) = O\big( t^{-(p + 3 - \kappa_1)} \big). \]

We choose $\kappa_1 = 3/5$ and we take $\kappa_2 = p + 3 - \kappa_1 = p + 12/5$. Note that $\kappa_1 \in (0,1)$, $\kappa_2 > 1$ and
that (3) is satisfied. Moreover, $\kappa_2 > p + 2$ and $\kappa_2 - \kappa_2/(2\kappa_1) - 1/2 = p/6 - 1/10 \geq 1/15 > 0$, and thus Condition
(4) is also satisfied.

□
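
The arithmetic at the end of the proof can be checked exactly with rational numbers. The sketch below assumes that Condition (4) reads $\max(p + 2, \kappa_2/(2\kappa_1) + 1/2) < \kappa_2$, which is how we read it here; the function names are ours.

```python
from fractions import Fraction
from typing import Tuple


def lemma1_constants(p: Fraction) -> Tuple[Fraction, Fraction]:
    """Return (kappa1, kappa2) as chosen in the proof of Lemma 1."""
    kappa1 = Fraction(3, 5)
    kappa2 = p + 3 - kappa1  # equals p + 12/5
    return kappa1, kappa2


def condition_4_margin(p: Fraction) -> Fraction:
    """kappa2 - kappa2/(2*kappa1) - 1/2; Condition (4) needs this to be positive."""
    kappa1, kappa2 = lemma1_constants(p)
    return kappa2 - kappa2 / (2 * kappa1) - Fraction(1, 2)


def condition_4_holds(p: Fraction) -> bool:
    """Condition (4) as read in this sketch: max(p + 2, kappa2/(2 kappa1) + 1/2) < kappa2."""
    kappa1, kappa2 = lemma1_constants(p)
    return max(p + 2, kappa2 / (2 * kappa1) + Fraction(1, 2)) < kappa2
```

The margin equals $p/6 - 1/10$, which is at least $1/15$ for every $p \geq 1$.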



2.1.1 Wasserstein deconvolution and c.d.f. deconvolution 

It is well known that the Wasserstein distance $W_1$ between two measures $\mu$ and $\mu'$ on $\mathbb{R}$ can be
computed using the cumulative distribution functions: let $\mu$ and $\mu'$ be two probability measures
on $\mathbb{R}$, then

\[ W_1(\mu, \mu') = \int_{\mathbb{R}} |F_\mu(x) - F_{\mu'}(x)|\, dx. \]
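
As an illustration of this identity, the following sketch integrates the absolute difference of two empirical c.d.f.s exactly; both are step functions, so the integral is a finite sum. The function name `w1_from_cdfs` is ours, and the samples are assumed to carry uniform weights.

```python
from bisect import bisect_right
from typing import Sequence


def w1_from_cdfs(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Integrate |F_x(t) - F_y(t)| over R for two empirical measures."""
    if not xs or not ys:
        raise ValueError("samples must be non-empty")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    grid = sorted(set(xs_sorted) | set(ys_sorted))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical c.d.f.s are constant on [left, right).
        f_x = bisect_right(xs_sorted, left) / len(xs_sorted)
        f_y = bisect_right(ys_sorted, left) / len(ys_sorted)
        total += abs(f_x - f_y) * (right - left)
    return total
```

Unlike the sorted-sample shortcut, this form also handles samples of different sizes, for example a Dirac mass at 0 against the uniform measure on $\{0, 2\}$, whose $W_1$ distance is 1.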

According to this property, lower bounds on the rates of convergence for estimating $\mu$ in the
one dimensional convolution model (1) for the metric $W_1$ can be directly deduced from lower
bounds on the rates of convergence for the estimation of the c.d.f. of $\mu$ using the integrated risk
$\mathcal{R}(\hat F) := \int_{\mathbb{R}} |F_\mu(t) - \hat F(t)|\, dt$. This last problem has been less studied than pointwise rates in
the deconvolution context but some results can be found in the literature. For instance [Fan92]
gives the optimal rate of convergence in the supersmooth case for an integrated (weighted) $L^p$
risk under similar smoothness conditions as for the pointwise case (studied in [Fan91b]). The
cubical method followed in [Fan92] to compute the integrated lower bound is also detailed in
[Fan93]. It is based on a multiple hypothesis strategy, see [Tsy09] for other examples of using
multiple hypothesis schemes for computing lower bounds for integrated risks.

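For $M > 0$ and $p \geq 1$, we consider the set $\mathcal{C}_A(M,p)$ of the measures $\mu$ in $\mathcal{D}_A(M,p)$ for which
the coordinates of $AX_1$ are independent. Thus, for $\mu \in \mathcal{C}_A(M,p)$:

\[ \sup_{1 \leq j \leq d} \Big( \mathbb{E}_\mu\big(1 + |(AX_1)_j|^{2p+2}\big) \prod_{1 \leq \ell \leq d,\ \ell \neq j} \mathbb{E}_\mu\big(1 + (AX_1)_\ell^2\big) \Big) \leq M < \infty. \]

Moreover we simply use the notation $\mathcal{C}(M,p)$ if $A = I_d$.

The following theorem gives lower bounds for $W_1(\tilde\mu_n, \mu)$ in the $d$-dimensional case, which
are derived from lower bounds on the rates of convergence of c.d.f. estimators in $\mathbb{R}$.

Theorem 3. Under the same assumptions as in Theorem 2, there exists $C > 0$ such that for
any estimator $\tilde\mu_n$ of the measure $\mu$:

\[ \liminf_{n \to \infty}\, (\log n)^{1/\beta} \sup_{\mu \in \mathcal{C}_A(M,p)} \mathbb{E}_{(\mu \star \mu_\varepsilon)^{\otimes n}} W_1(\tilde\mu_n, \mu) \geq C. \]

Theorem 2 is a corollary of Theorem 3 because

1. $\mathcal{C}_A(M,p)$ is a subset of $\mathcal{D}_A(M,p)$.

2. For any $p \geq 1$, $\mathbb{E}_{(\mu \star \mu_\varepsilon)^{\otimes n}} W_1^p(\tilde\mu_n, \mu) \geq \big( \mathbb{E}_{(\mu \star \mu_\varepsilon)^{\otimes n}} W_1(\tilde\mu_n, \mu) \big)^p$.

3. $W_1$ is the smallest among all the Wasserstein distances: for any $p \geq 1$ and any measures
$\mu$ and $\mu'$ on $\mathbb{R}^d$: $W_p(\mu, \mu') \geq W_1(\mu, \mu')$.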
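
Points 2 and 3 combine into the chain $\mathbb{E}\, W_p^p \geq \mathbb{E}\, W_1^p \geq (\mathbb{E}\, W_1)^p$. The sketch below checks this chain numerically on a handful of one-dimensional sample pairs standing in for independent replications; it is only a sanity check on toy data, and the helper names are ours.

```python
from typing import Sequence, Tuple

Sample = Sequence[float]


def wp_sorted(xs: Sample, ys: Sample, p: float) -> float:
    """W_p between equal-size empirical measures on R (sorted coupling)."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return (sum(abs(x - y) ** p for x, y in pairs) / len(xs)) ** (1.0 / p)


def lower_bound_chain(replications: Sequence[Tuple[Sample, Sample]], p: float) -> Tuple[float, float, float]:
    """Return (mean of W_p^p, mean of W_1^p, (mean of W_1)^p) over the replications."""
    m = len(replications)
    mean_wpp = sum(wp_sorted(xs, ys, p) ** p for xs, ys in replications) / m
    mean_w1p = sum(wp_sorted(xs, ys, 1.0) ** p for xs, ys in replications) / m
    mean_w1 = sum(wp_sorted(xs, ys, 1.0) for xs, ys in replications) / m
    return mean_wpp, mean_w1p, mean_w1 ** p
```

The first inequality is point 3 applied replication by replication; the second is Jensen's inequality, which is point 2.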
2.2 Proof of Theorem 3

Since the works of Le Cam, it is well known that rates of convergence of estimators on some
probability measure space $\mathcal{P}$ can be lower bounded by introducing some convenient finite subset of
$\mathcal{P}$ whose elements are close enough for the total variation distance or for the Hellinger distance.
In the deconvolution setting, the $\chi^2$ distance is preferable to these last metrics. Here, the following
definition of the $\chi^2$ distance will be sufficient: for two positive densities $h_1$ and $h_2$ with respect
to the Lebesgue measure on $\mathbb{R}^d$, the $\chi^2$ distance between $h_1$ and $h_2$ is defined by

\[ \chi^2(h_1, h_2) = \int_{\mathbb{R}^d} \frac{\{h_1(x) - h_2(x)\}^2}{h_1(x)}\, dx. \]

The main arguments for proving Theorem 3 come from [Fan91b, Fan92, Fan93]. However
some modifications are necessary to compute the lower bounds under the moment assumption
$\mathcal{C}_A(M,p)$. Furthermore, we note that Theorem 1 in [Fan93] cannot be directly applied in this
multivariate context.

Without loss of generality, we take $j_0 = 1$. We shall first prove Theorem 3 in the case where
$\varepsilon_1$ has independent coordinates.

2.2.1 Errors with independent coordinates 

In this section, we observe $Y_1, \dots, Y_n$ in the multivariate convolution model (1) and we assume
that the random variables $(\varepsilon_{1,j})_{1 \leq j \leq d}$ are independent. This means that $A = I_d$ and that $\varepsilon_1$ has
the distribution $\mu_\varepsilon = \mu_{\varepsilon,1} \otimes \mu_{\varepsilon,2} \otimes \cdots \otimes \mu_{\varepsilon,d}$.

Definition of a finite family in $\mathcal{C}(M,p)$. Let us introduce a finite class of probability measures
in $\mathcal{C}(M,p)$ which are absolutely continuous with respect to the Lebesgue measure $\lambda_d$. First, we
define some densities

\[ f_{0,r}(t) := C_r (1 + t^2)^{-r} \tag{5} \]

with some $r > 0$ such that

\[ \max\Big( p + \frac{3}{2},\ \frac{\kappa_2}{2\kappa_1} \Big) < r < \kappa_2 - \frac{1}{2}. \tag{6} \]

Note that this is possible according to (4). Moreover, $f_{0,r}$ has a finite $(2p + 2)$-th moment.
Next, let b n be the sequence 



b n := 



IYl. sW 
- log n 
V 



VI, (7) 

where [•] is the integer part, and rj = {l — 2 2 J-\ J ll- Note that b n is correctly defined in this 
way since $\kappa_2 - \frac{1}{2} > r$. For any $\theta \in \{0,1\}^{b_n}$, let

\[ f_\theta(t) = f_{0,r}(t) + A \sum_{s=1}^{b_n} \theta_s H\big(b_n(t - t_{s,n})\big), \quad t \in \mathbb{R}, \tag{8} \]

where $A$ is a positive constant and $t_{s,n} = (s - 1)/b_n$. The function $H$ is a bounded function
whose integral on the line is 0. Moreover, we may choose a function $H$ such that (see for instance
[Fan91b] or [Fan93]):

(A1) $\int_{-\infty}^{+\infty} H\, dt = 0$ and $\int_0^1 |H^{(-1)}|\, dt > 0$,
(A2) $|H(t)| \leq c (1 + t^2)^{-r}$,
(A3) $H^*(z) = 0$ outside $[1,2]$,

where $H^{(-1)}(t) := \int_{-\infty}^{t} H(u)\, du$ is a primitive of $H$.

Using (A2) and Lemma 3 of Appendix A, we choose $A > 0$ small enough in such a way
that $f_\theta$ is a density on $\mathbb{R}$. Note that by replacing $H$ by $H/A$ in the following, we finally can
take $A = 1$ in (8). Using (A2) and Lemma 3 again, we get that for $M$ large enough, for all
$\theta \in \{0,1\}^{b_n}$:

\[ \int_{\mathbb{R}} (1 + t^2 \vee t^{2p+2})\, f_\theta(t)\, dt \leq M^{1/d}. \tag{9} \]

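We finally use these univariate densities $f_\theta$ to define a finite family of probability measures
on $\mathbb{R}^d$ which is included in $\mathcal{C}(M,p)$. For $\theta \in \{0,1\}^{b_n}$, let us define the probability measure on
$\mathbb{R}^d$:

\[ \mu_\theta := (f_\theta \cdot d\lambda) \otimes (f_{0,r} \cdot d\lambda) \otimes \cdots \otimes (f_{0,r} \cdot d\lambda). \tag{10} \]

For any $j \in \{1, \dots, d\}$, according to (9):

\[ \mathbb{E}_{\mu_\theta}\big(1 + |X_{1,j}|^{2p+2}\big) \prod_{\ell \neq j} \mathbb{E}_{\mu_\theta}\big(1 + X_{1,\ell}^2\big) \leq M \]

and thus $\mu_\theta \in \mathcal{C}(M,p)$.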

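Lower bound. Let $\tilde\mu_n$ be an estimator of $\mu$ and let $(\tilde\mu_n)_1$ be the marginal distribution of
$\tilde\mu_n$ on the first coordinate (conditionally to the sample $Y_1, \dots, Y_n$). According to Lemma 6 of
Appendix B,

\[ \begin{aligned}
\sup_{\mu \in \mathcal{C}(M,p)} \mathbb{E}_{(\mu \star \mu_\varepsilon)^{\otimes n}} W_1(\tilde\mu_n, \mu)
&\geq \sup_{\theta \in \{0,1\}^{b_n}} \mathbb{E}_{(\mu_\theta \star \mu_\varepsilon)^{\otimes n}} W_1(\mu_\theta, \tilde\mu_n) \\
&\geq \sup_{\theta \in \{0,1\}^{b_n}} \mathbb{E}_{(\mu_\theta \star \mu_\varepsilon)^{\otimes n}} W_1\big(f_\theta \cdot d\lambda, (\tilde\mu_n)_1\big) \\
&\geq \inf_{\hat f_n} \sup_{\theta \in \{0,1\}^{b_n}} \mathbb{E}_{(\mu_\theta \star \mu_\varepsilon)^{\otimes n}} W_1\big(f_\theta \cdot d\lambda, \hat f_n\big),
\end{aligned} \]

where the infimum of the last line is taken over all the probability measure estimators of $f_\theta \cdot d\lambda$.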

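Following [Fan93] (see also the proof of Theorem 2.14 in [Mei09]), we now introduce a
random vector $\tilde\theta$ whose components $\tilde\theta_s$ are i.i.d. Bernoulli random variables $\tilde\theta_1, \dots, \tilde\theta_{b_n}$ such
that $\mathbb{P}(\tilde\theta_s = 1) = \frac{1}{2}$. The density $f_{\tilde\theta}$ is thus a random density taking its values in the set of
densities defined by (8). Let $\bar{\mathbb{E}}$ be the expectation according to the law of $\tilde\theta$. For any probability
estimator $\hat f_n$:

\[ \sup_{\mu \in \mathcal{C}(M,p)} \mathbb{E}_{(\mu \star \mu_\varepsilon)^{\otimes n}} W_1(\mu, \tilde\mu_n)
\geq \bar{\mathbb{E}}\, \mathbb{E}_{(\mu_{\tilde\theta} \star \mu_\varepsilon)^{\otimes n}} W_1\big(f_{\tilde\theta} \cdot d\lambda, \hat f_n\big)
\geq \int_0^1 \bar{\mathbb{E}}\, \mathbb{E}_{(\mu_{\tilde\theta} \star \mu_\varepsilon)^{\otimes n}} \big( |F_{\tilde\theta}(t) - \hat F_n(t)| \big)\, dt \tag{11} \]

where $\hat F_n$ and $F_{\tilde\theta}$ are the c.d.f. of the distributions $\hat f_n$ and $f_{\tilde\theta} \cdot d\lambda$. For $\theta \in \{0,1\}^{b_n}$ and
$s \in \{1, \dots, b_n\}$, let us define

\[ f_{\theta,s,0} := f_{(\theta_1, \dots, \theta_{s-1}, 0, \theta_{s+1}, \dots, \theta_{b_n})} \quad \text{and} \quad f_{\theta,s,1} := f_{(\theta_1, \dots, \theta_{s-1}, 1, \theta_{s+1}, \dots, \theta_{b_n})} \]

and the corresponding probability measures $\mu_{\theta,s,0}$ and $\mu_{\theta,s,1}$ on $\mathbb{R}^d$ defined by (10) for $f_\theta = f_{\theta,s,0}$
or $f_{\theta,s,1}$. Let $\bar h_{\theta,s,0}$ and $\bar h_{\theta,s,1}$ be the densities of $\mu_{\theta,s,0} \star \mu_\varepsilon$ and $\mu_{\theta,s,1} \star \mu_\varepsilon$ for the Lebesgue measure
on $\mathbb{R}^d$. Since the margins of $\mu_\theta$ and $\mu_\varepsilon$ are independent, for any $y_i = (y_{i,1}, \dots, y_{i,j}, \dots, y_{i,d}) \in \mathbb{R}^d$
($u = 0$ or $1$), we have:

\[ \bar h_{\theta,s,u}(y_i) = h_{\theta,s,u}(y_{i,1}) \prod_{j = 2, \dots, d} f_{0,r} \star \mu_{\varepsilon,j}(y_{i,j}) \tag{12} \]

where $h_{\theta,s,u} = f_{\theta,s,u} \star g$. Let $F_{\theta,s,0}$ and $F_{\theta,s,1}$ be the c.d.f. of $f_{\theta,s,0}$ and $f_{\theta,s,1}$. For $t \in [t_{s,n}, t_{s+1,n}]$
where $s \in \{1, \dots, b_n\}$, by conditioning by $\tilde\theta_s$, we find that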



E, 



-E 



EE^p (Ji^(t)-F re (t)| 

1 

Hence 

EE ( )^(|F e -(t)-F n (t)|) >^E / ... / (t)-F n (*)| + |^ lS i(*)-^nWl} 

/ n n \ 

min ( ]J h § s Q (yi), h^iiVi) J d Vl ■ ■ ■ d Vn , 

and consequently, according to (fl~2]) . 
EE^) 8 " (l^W " &(<)l) > ^ E / • • • / l^,.,o(<) " 



i=l 



min | JJ h §s0 (y itl ), JJ h §sl (y itl ) J <j JJ / , r * fJ- £ ,j(yi,j) \ d Vl ■ ■ ■ dy n . 

\i=l i=l / [j=2 

By using Fubini, it follows that 



J E / ■ ■ / l^ s ,o(*) - min f ii^fc). n^ j8 ,l(y»,l) J • • - d Vn. 

Note that for any E {0, 1}S |i^ s0 (t) - = b" 1 \H^ (b n (t - t s>n ))\, thus 

EE (w)8n (|F e -(t)-F n (t)|) > 



i • 



|ff(-D (b w (f-t a|W ))|. 
26 n 



E/ min I n^.s.o^. 1 )'!!^,!^ 1 ) ) d V^--- d Vn,i- ( 13 ) 



\i=l i=l / 

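According to Le Cam's Lemma (see Lemma 7 of Appendix B), for any $\theta \in \{0,1\}^{b_n}$:

\[ \begin{aligned}
\int \min\Big( \prod_{i=1}^n h_{\theta,s,0}(y_{i,1}), \prod_{i=1}^n h_{\theta,s,1}(y_{i,1}) \Big)\, dy_{1,1} \cdots dy_{n,1}
&\geq \frac{1}{2} \Big[ \int \sqrt{ \prod_{i=1}^n h_{\theta,s,0}(y_{i,1})\, h_{\theta,s,1}(y_{i,1}) }\; dy_{1,1} \cdots dy_{n,1} \Big]^2 \\
&\geq \frac{1}{2} \Big[ \int \sqrt{ h_{\theta,s,0}(y_{1,1})\, h_{\theta,s,1}(y_{1,1}) }\; dy_{1,1} \Big]^{2n} \\
&\geq \frac{1}{2} \Big[ 1 - \frac{1}{2} \chi^2(h_{\theta,s,0}, h_{\theta,s,1}) \Big]^{2n}
\end{aligned} \tag{14} \]

where we have used Lemma 8 of Appendix B for the last inequality. Assume for the moment
that there exists a constant $c > 0$ such that for any $\theta \in \{0,1\}^{b_n}$:

\[ \chi^2(h_{\theta,s,0}, h_{\theta,s,1}) \leq \frac{c}{n}. \tag{15} \]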

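Then, using (11), (13), (14) and (15), we find that there exists a constant $C > 0$ such that

\[ \sup_{\mu \in \mathcal{C}(M,p)} \mathbb{E}_{(\mu \star \mu_\varepsilon)^{\otimes n}} W_1(\mu, \tilde\mu_n)
\geq \frac{C}{b_n} \sum_{s=1}^{b_n} \int_{t_{s,n}}^{t_{s+1,n}} \big| H^{(-1)}\big(b_n(t - t_{s,n})\big) \big|\, dt
\geq \frac{C}{b_n} \int_0^1 |H^{(-1)}(u)|\, du. \]

Take $b_n$ as in (7) and the theorem is thus proved (for $A = I_d$) since the last term is positive
according to (A1).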



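Proof of (15). Let $C$ be a positive constant which may vary from line to line. We follow
[Fan92] to show that (15) is valid for $b_n$ chosen as in (7). Recall that we have chosen the
function $H$ such that, by Lemma 3 of Appendix A, $f_\theta \geq C f_{0,r}$. Thus,

\[ \begin{aligned}
\chi^2(h_{\theta,s,0}, h_{\theta,s,1})
&\leq \int_{-\infty}^{+\infty} \frac{ \big\{ \int_{-\infty}^{+\infty} H[b_n(t - u - t_{s,n})]\, g(u)\, du \big\}^2 }{ f_{\theta,s,0} \star g(t) }\, dt \\
&\leq C \int_{-\infty}^{+\infty} \frac{ \big\{ \int_{-\infty}^{+\infty} H[b_n(t - u - t_{s,n})]\, g(u)\, du \big\}^2 }{ f_{0,r} \star g(t) }\, dt \\
&\leq C \int_{-\infty}^{+\infty} \frac{ \big\{ \int_{-\infty}^{+\infty} H[b_n(t' - u)]\, g(u)\, du \big\}^2 }{ \int_{-\infty}^{+\infty} f_{0,r}(t' + t_{s,n} - u)\, g(u)\, du }\, dt'.
\end{aligned} \]

Moreover, there exists a positive constant $C$ such that for any $t \in \mathbb{R}$ and any $s \in \{1, \dots, b_n\}$,
$f_{0,r}(t + t_{s,n}) \geq C f_{0,r}(t)$. Then,

\[ \begin{aligned}
\chi^2(h_{\theta,s,0}, h_{\theta,s,1})
&\leq C \int_{-\infty}^{+\infty} \frac{ \big\{ \int_{-\infty}^{+\infty} H[b_n(t' - u)]\, g(u)\, du \big\}^2 }{ \int_{-\infty}^{+\infty} f_{0,r}(t' - u)\, g(u)\, du }\, dt' \\
&\leq C b_n^{-1} \int_{-\infty}^{+\infty} \frac{ \big\{ \int_{-\infty}^{+\infty} H(v - y)\, g(y/b_n)\, dy/b_n \big\}^2 }{ f_{0,r} \star g(v/b_n) }\, dv.
\end{aligned} \tag{16} \]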

The right side of (16) is typically the kind of $\chi^2$ divergence that is upper bounded in the
proof of Theorem 4 in [Fan91b] for computing pointwise rates of convergence. However, a
slight modification of the proof of Fan is necessary since we cannot assume here that $r <
\min(1, \kappa_2 - 0.5)$ (because $r > p + 3/2$). It is shown in the proof of Theorem 4 in [Fan91b] that

\[ \int_{-\infty}^{+\infty} \Big\{ \int_{-\infty}^{+\infty} H(v - y)\, g(y/b_n)\, dy/b_n \Big\}^2 dv = O\big( b_n^{2\tilde\beta} \exp(-2 b_n^\beta / \gamma) \big). \tag{17} \]

According to Lemma 4 of Appendix A, there exist $t_0 > 0$, $C_1 > 0$ and $C_2 > 0$ such that for any
$t \in \mathbb{R}$:

\[ f_{0,r} \star g(t) \geq C_1 \mathbf{1}_{|t| \leq t_0} + C_2 |t|^{-2r} \mathbf{1}_{|t| > t_0}. \tag{18} \]
Note that we can apply Lemma 5 of Appendix A since $r$ satisfies (6). Then, using (17), (18)
and Lemma 5 of Appendix A, for $T > t_0$ we have:



fo,r*g(v/b n ) 

JL^ 3 #(« - y)g{y/K) dy/b n j 



\v\/b n <T 



fo,r*g(v/b n ) 



dv 



y)g{y/b n )dy/b n } 



\v\/b n >T 



2r\-l 



< (Ci A C 2 T-^) 



oo V. </ — oo 



- y)g{y/b n ) dy/b n \ dv + C, 



fo,r*g(v/b n ) 

{\v\/b n )- 2 ^ 

\v\/b n >T {\v\/bn)- 2r 



dv 



dv 



< O (T 2r bf exp(-2&£/ 7 )) + O (bf-~^T 



n2(r-K 2 )+l 



2i — 2K 2 -2f3 



lb 

for T large enough. By taking T = T n = b n * K2 1 exp ( , 2k J in this bound and according 



to (|16p . we find that for n large enough: 
X 2 {he, S fi , ho iS> i) = O ^b 

= O (exp(—rj 



exp 
= O 



7(2*2-1) 



2&£ r 



2r 



2k 2 - 1 



for $b_n$ defined by (7).



2.2.2 The general case 

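We now assume, as in the introduction, that there exists an invertible matrix $A$ such that the
coordinates of the vector $A\varepsilon_1$ are independent. Let $\mu \in \mathcal{C}_A(M,p)$ and let $\tilde\mu_n$ be an estimator of
the probability measure $\mu$. Let $\mu^A$ and $\tilde\mu_n^A$ be the image measures of $\mu$ and $\tilde\mu_n$ by $A$. Then,

\[ \begin{aligned}
W_1(\tilde\mu_n^A, \mu^A) &= \min_{\pi \in \Pi(\tilde\mu_n^A, \mu^A)} \int_{\mathbb{R}^d \times \mathbb{R}^d} \|x - y\|\, \pi(dx, dy) \\
&= \min_{\tilde\pi \in \Pi(\tilde\mu_n, \mu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} \|Ax - Ay\|\, \tilde\pi(dx, dy) \\
&\leq \|A\|\, W_1(\tilde\mu_n, \mu),
\end{aligned} \]

where $\|A\| = \sup_{\|x\| = 1} \|Ax\|$. Consequently $W_1(\tilde\mu_n, \mu) \geq \|A\|^{-1} W_1(\tilde\mu_n^A, \mu^A)$.

The image measure of $\mu \star \mu_\varepsilon$ by $A$ is equal to $\mu^A \star \mu_\varepsilon^A$, where $\mu_\varepsilon^A$ is the image measure of
$\mu_\varepsilon$ by $A$. Moreover, the probability measure estimator $\tilde\mu_n^A$ can be written $\tilde\mu_n^A = m(Z_1, \dots, Z_n)$
where $Z_i = AY_i$ and $m$ is a measurable function from $(\mathbb{R}^d)^n$ into the set of probability measures
on $\mathbb{R}^d$. Thus,

\[ \mathbb{E}_{(\mu \star \mu_\varepsilon)^{\otimes n}} W_1(\tilde\mu_n^A, \mu^A) = \mathbb{E}_{(\mu^A \star \mu_\varepsilon^A)^{\otimes n}} W_1\big( m(Z_1, \dots, Z_n), \mu^A \big). \]

Since $\mu \in \mathcal{C}_A(M,p) \iff \mu^A \in \mathcal{C}(M,p)$, we obtain that

\[ \sup_{\mu \in \mathcal{C}_A(M,p)} \mathbb{E}_{(\mu \star \mu_\varepsilon)^{\otimes n}} W_1(\tilde\mu_n, \mu) \geq \|A\|^{-1} \sup_{\nu \in \mathcal{C}(M,p)} \mathbb{E}_{(\nu \star \mu_\varepsilon^A)^{\otimes n}} W_1\big( m(Z_1, \dots, Z_n), \nu \big). \tag{19} \]

Note that, in the model $Z_i = AX_i + A\varepsilon_i$, the error $\eta_i = A\varepsilon_i$ has independent coordinates and
satisfies the assumptions of Theorem 2 for $A = I_d$.
We now apply the lower bound obtained in Section 2.2.1, which gives that there exists a
positive constant $C$ such that

\[ \liminf_{n \to \infty}\, (\log n)^{1/\beta} \sup_{\nu \in \mathcal{C}(M,p)} \mathbb{E}_{(\nu \star \mu_\varepsilon^A)^{\otimes n}} W_1\big( m(Z_1, \dots, Z_n), \nu \big) \geq C. \tag{20} \]

The result follows from (19) and (20).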



3 Upper bounds 

In this section, we generalize the results of [CCDM11] by proving an upper bound on the rates
of convergence for the estimation of the probability $\mu$ under any metric $W_p$.

3.1 Errors with independent coordinates

In this section, we assume that the random variables $(\varepsilon_{1,j})_{1 \leq j \leq d}$ are independent, which means
that $\varepsilon_1$ has the distribution $\mu_\varepsilon = \mu_{\varepsilon,1} \otimes \cdots \otimes \mu_{\varepsilon,d}$.

Let $p \in [1, \infty[$ and denote by $\lceil p \rceil$ the smallest integer greater than $p$. We first define a
kernel $k$ whose Fourier transform is smooth enough and compactly supported over $[-1,1]$. Such
kernels can be defined by considering powers of the sinc function. More precisely, let

\[ k(x) = c_p \left\{ \frac{(2\lceil p/2 \rceil + 2)\, \sin\big( x / (2\lceil p/2 \rceil + 2) \big)}{x} \right\}^{2\lceil p/2 \rceil + 2}, \]

where $c_p$ is such that $\int k(x)\, dx = 1$. The kernel $k$ is a symmetric density, and $k^*$ is supported
over $[-1,1]$. Moreover $k^*$ is $\lceil p \rceil$ times differentiable with Lipschitz $\lceil p \rceil$-th derivative. For any
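$j \in \{1, \dots, d\}$ and any $h_j > 0$, let

\[ \tilde k_{j,h_j}(x) = \frac{1}{2\pi} \int e^{iux}\, \frac{k^*(u)}{\mu^*_{\varepsilon,j}(-u/h_j)}\, du. \]

A preliminary estimator $\hat f_n$ is given by

\[ \hat f_n(x_1, \dots, x_d) = \frac{1}{n} \sum_{i=1}^n \prod_{j=1}^d \frac{1}{h_j}\, \tilde k_{j,h_j}\Big( \frac{x_j - Y_{i,j}}{h_j} \Big). \tag{21} \]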
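
The kernel $k$ used above has no simple closed-form normalising constant for general $p$, but $c_p$ is easy to approximate numerically. The sketch below assumes the kernel has the form $k(x) = c_p \{ m \sin(x/m)/x \}^m$ with $m = 2\lceil p/2 \rceil + 2$, and that $\lceil \cdot \rceil$ can be computed with `math.ceil`; both are our reading, and the function names and integration parameters are illustrative choices.

```python
import math


def kernel_order(p: float) -> int:
    """Exponent m = 2*ceil(p/2) + 2 (assumed reading of the kernel definition)."""
    return 2 * math.ceil(p / 2) + 2


def unnormalised_kernel(x: float, p: float) -> float:
    """(m * sin(x/m) / x)^m, extended by continuity with value 1 at x = 0."""
    m = kernel_order(p)
    if x == 0.0:
        return 1.0
    return (m * math.sin(x / m) / x) ** m


def normalising_constant(p: float, half_width: float = 200.0, steps: int = 200_000) -> float:
    """Approximate c_p = 1 / integral of the unnormalised kernel (trapezoidal rule).

    The integral is truncated to [-half_width, half_width]; the tails decay
    like |x|^(-m), so the truncation error is small for the default width.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * unnormalised_kernel(x, p)
    return 1.0 / (total * h)
```

Since $m$ is even, the kernel is non-negative. For $p = 1$ one has $m = 4$, and the classical value $\int_{\mathbb{R}} (\sin u / u)^4\, du = 2\pi/3$ gives $c_1 = 3/(8\pi)$, which the numerical approximation reproduces.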



The estimator (21) is the multivariate version of the standard deconvolution kernel density
estimator which was first introduced in [CH88]. This estimator has been the subject of many
works in the one dimensional case, but only few authors have studied the multidimensional
deconvolution problem, see [Tan94], [CL11] and [CCDM11].

The estimator $\hat f_n$ is not necessarily a density, since it has no reason to be non negative.
Since our estimator has to be a probability measure, we define

\[ \hat g_n(x) = \alpha_n \hat f_n^+(x), \quad \text{where } \alpha_n = \frac{1}{\int_{\mathbb{R}^d} \hat f_n^+(x)\, dx} \text{ and } \hat f_n^+ = \max\{0, \hat f_n\}. \]

The estimator $\hat\mu_n$ of $\mu$ is then the probability measure with density $\hat g_n$.
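
The passage from the signed preliminary estimate to a genuine density can be sketched on a regular grid as follows. This is only a one-dimensional discretised illustration: the function name is ours, and the integral of the positive part is approximated by a Riemann sum.

```python
from typing import List, Sequence


def to_density(values: Sequence[float], step: float) -> List[float]:
    """Clip a signed estimate to its positive part and rescale it to integrate to 1.

    `values` are evaluations of the preliminary estimator on a regular grid
    with spacing `step`; the integral is approximated by a Riemann sum.
    """
    positive = [max(0.0, v) for v in values]
    mass = sum(positive) * step
    if mass <= 0.0:
        raise ValueError("positive part has zero mass on this grid")
    return [v / mass for v in positive]
```

For example, the grid values `[-0.1, 0.5, 1.0, 0.5, -0.2]` with spacing `0.5` are mapped to `[0.0, 0.5, 1.0, 0.5, 0.0]`, whose Riemann sum equals 1.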

The next theorem gives the rates of convergence of the estimator $\hat\mu_n$ under some assumptions
on the derivatives of the functions $r_j := 1/\mu^*_{\varepsilon,j}$.
Theorem 4. Let $M > 0$. Assume that we observe an $n$-sample $Y_1, \dots, Y_n$ in the multivariate
convolution model (1). Also assume that there exist $\beta > 0$, $\tilde\beta \geq 0$, $\gamma_2 > 0$ and $c_2 > 0$ such that
for every $j \in \{1, \dots, d\}$, every $\ell \in \{0, 1, \dots, \lceil p \rceil + 1\}$ and every $t \in \mathbb{R}$:

\[ |r_j^{(\ell)}(t)| \leq c_2 (1 + |t|^{\tilde\beta}) \exp(|t|^\beta / \gamma_2). \tag{22} \]

Taking $h_1 = \cdots = h_d = (4d/(\gamma_2 \log(n)))^{1/\beta}$, there exists a positive constant $C$ such that

\[ \sup_{\mu \in \mathcal{D}(M,p)} \mathbb{E}_{(\mu \star \mu_\varepsilon)^{\otimes n}} \big( W_p^p(\mu, \hat\mu_n) \big) \leq C (\log n)^{-p/\beta}. \]
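
The bandwidth of Theorem 4 can be computed directly. The sketch below also evaluates the factor $\exp(d/(\gamma_2 h^\beta))$, which, as we read the proof below, is the exponential term entering the variance bound; with this bandwidth it equals exactly $n^{1/4}$, so it is dominated by the $n^{-1/2}$ factor of the variance term. The function names are ours.

```python
import math


def bandwidth(n: int, d: int, beta: float, gamma2: float) -> float:
    """h = (4 d / (gamma2 * log n))^(1/beta), common to all d coordinates."""
    if n < 2:
        raise ValueError("n must be at least 2 so that log n > 0")
    return (4.0 * d / (gamma2 * math.log(n))) ** (1.0 / beta)


def variance_exponential_factor(n: int, d: int, beta: float, gamma2: float) -> float:
    """exp(d / (gamma2 * h^beta)) at the bandwidth above; equals n^(1/4)."""
    h = bandwidth(n, d, beta, gamma2)
    return math.exp(d / (gamma2 * h ** beta))
```

The bias term is of order $h^p$, that is $(\log n)^{-p/\beta}$ up to constants, which is the rate stated in the theorem.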




3.1.1 Proof of Theorem 4

Let $H = (h_1, h_2, \dots, h_d)$. We follow the proof of Proposition 2 in [CCDM11]. First we have the
bias-variance decomposition

\[ \mathbb{E}_{(\mu \star \mu_\varepsilon)^{\otimes n}} \big( W_p^p(\hat\mu_n, \mu) \big) \leq 2^{p-1} B(H) + 2^{2(p-1)} \int \big( 2^{p-1} C(H) + \|x\|^p \big) \sqrt{\mathrm{Var}(\hat f_n(x))}\, dx, \]

where

\[ B(H) = \int \|H^t x\|^p K(x)\, dx \quad \text{and} \quad C(H) = B(H) + \int \|x\|^p \mu(dx). \]

The proof of this inequality is the same as that of Proposition 1 in [CCDM11], by using Theorem
6.15 in [Vil08].

Note that $B(H)$ is such that $B(H) \leq d^{p-1} \beta_p (h_1^p + \cdots + h_d^p)$, with $\beta_p = \int |u|^p k(u)\, du$. To ensure
the consistency of the estimator, the bias term $B(H)$ has to tend to zero as $n$ tends to infinity.
Without loss of generality, we assume in the following that $H$ is such that $B(H) \leq 1$. Hence,
the variance term

\[ V_n = 2^{2(p-1)} \int \big( 2^{p-1} C(H) + \|x\|^p \big) \sqrt{\mathrm{Var}(\hat f_n(x))}\, dx \]

is such that

\[ V_n \leq C \int \Big( 1 + \sum_{j=1}^d |x_j|^p \Big) \sqrt{\mathrm{Var}(\hat f_n(x_1, \dots, x_d))}\, dx_1 \dots dx_d \]

for some positive constant $C$. Now




Applying Cauchy-Schwarz's inequality d-times, we obtain that 
where $D_1$ and $D_2$ are positive constants depending on $d$. Now, by independence of $X_1$ and $\varepsilon_1$,
and by independence of the coordinates of $\varepsilon_1$,

d d d 

( v Y ti)) < e 4 n^ 1 + X h)) + E ^( £ li))- 

i=i i=i i=i 



Since $\mu \in \mathcal{D}(M,p)$, it follows that



Var(/ n (2;)) (ix < — ^ 
/n 



II p + ^Pr^K)) 2 ^ 
j=l ^ J 



(23) 



In the same way, using again that $\mu \in \mathcal{D}(M,p)$, we obtain that



|^| p ^/Var(/ n (xi, . . . , x n )) dxi... dx d 



< 



(1 + \u e \*P+*h? +2 )UhMYduz J] /(l + uffi^-ikj^iu^fduj. 



Starting from these computations, one can prove the following proposition.
Proposition 1. Let $(h_1, \dots, h_d) \in [0,1]^d$. The following upper bound holds

T Id d d 

\^ e) ^{w^ n ^))<{2dY~ i m+---+h p d )+— \ ni J (h J )+Y,Mhi)( n w. 

vn y=i e=i j=i,m 

where $L$ is some positive constant and

Ij(h) < \ [^(r^r + iriAuWdu, 



l/h 



Jj{h) < J J 1/h (r,(u)) 2 + (rj W+1) (u)) 2 du 



+ £ fcW+1 ~V/-. 



l/h 



(rf\u)) 2 du. 



Let us finish the proof of Theorem 4 before proving Proposition 1. Take $h_1 = \cdots = h_d = h$.
The condition (22) on the derivatives of $r_j$ leads to the upper bounds

Va-^WGw)) < c(h p + ^ d( ^ +1)/2 exp(d/( 72 ^))) . 
The choice $h = (4d/(\gamma_2 \log(n)))^{1/\beta}$ gives the desired result.

Proof of Proposition 1. It follows the proof of Proposition 2 in [CCDM11]. By Plancherel's
identity,



/ -ikj h (u)Ydu = — / - . I , , , „ du 
J h y ]M n 2itJ h{nlj 



1 f (k*(hu)f 



{u/h)f 



du 



2W 0<>)) 2 

~ 27rJ_ 1/h ^ ' 
the last upper bound being true because $k^*$ is supported over $[-1,1]$ and bounded by 1.

Let $C$ be a positive constant, which may vary from line to line. Let $q_{j,h}(u) = r_j(u/h) k^*(u)$.
Since $q_{j,h}$ is differentiable with compactly supported derivative, we have that

\[ -iu\, 2\pi\, \tilde k_{j,h}(u) = (q'_{j,h})^*(u). \]

Applying Plancherel's identity again,

.2/51 . f..\\2j.. _ 1 f utJ f..\\2. 



hu (kj jh (u)) du = —I h(q' jh (u)) du 



< C 



2vr „ 

l/h rl/h 

(r'j(u)) 2 du + h 2 J r](u)duj, 



l/h J -l/h 



the last inequality being true because $k^*$ and $(k^*)'$ are compactly supported over $[-1,1]$.
Consequently

+ ^)^(W^)) 2 ^' ^ CI i( h i) • 
tij 

In the same way

\[ (-iu)^{\lceil p \rceil + 1}\, 2\pi\, \tilde k_{j,h}(u) = \big( q_{j,h}^{(\lceil p \rceil + 1)} \big)^*(u) \]

and

\m+\m+*^ h{ u)fdu=^ J h 2 ^ + \q^ +1 \u)) 2 du. 
Now, since $k^*, (k^*)', \dots, (k^*)^{(\lceil p \rceil + 1)}$ are compactly supported over $[-1,1]$,



r M+i ,i/h 

/ h 2 W+\q$ +1) (u)) 2 du < C / i 2(rpl+1 " fc) / (rf\u)) 

J 7-_n J — l/h 



{k) ^ 2 du. 



k=0 



Consequently 



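\[ \int \big( 1 + |u_\ell|^{2p+2} h_\ell^{2p+2} \big) \big( \tilde k_{\ell,h_\ell}(u_\ell) \big)^2 du_\ell
\leq 2 \int \big( 1 + |u_\ell|^{2\lceil p \rceil + 2} h_\ell^{2\lceil p \rceil + 2} \big) \big( \tilde k_{\ell,h_\ell}(u_\ell) \big)^2 du_\ell \leq C J_\ell(h_\ell). \]

The result follows. □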
3.2 The general case 

Here, as in the introduction, we shall assume that there exists an invertible matrix $A$ such that
the coordinates of the vector $A\varepsilon_1$ are independent. Applying $A$ to the random variables $Y_i$ in
(1), we obtain the new model

\[ AY_i = AX_i + A\varepsilon_i, \]

that is: a convolution model in which each error vector $\eta_i = A\varepsilon_i$ has independent coordinates.
To estimate the image measure $\mu^A$ of $\mu$ by $A$, we use the preliminary estimator (21), that is

\[ \hat f_{n,A}(x_1, \dots, x_d) = \frac{1}{n} \sum_{i=1}^n \prod_{j=1}^d \frac{1}{h_j}\, \tilde k_{j,h_j}\Big( \frac{x_j - (AY_i)_j}{h_j} \Big), \]

and the estimator $\hat\mu_{n,A}$ of $\mu^A$ is deduced from $\hat f_{n,A}$ as in Section 3.1. This estimator $\hat\mu_{n,A}$ has
the density $\hat g_{n,A}$ with respect to the Lebesgue measure.
To estimate $\mu$, we define $\hat\mu_n$ as the image measure of $\hat\mu_{n,A}$ by $A^{-1}$. This estimator
has the density $\hat g_n = |A|\, \hat g_{n,A} \circ A$ with respect to the Lebesgue measure. It can be deduced from
the preliminary estimator $\hat f_n = |A|\, \hat f_{n,A} \circ A$ as in Section 3.1. Now

\[ W_p^p(\hat\mu_n, \mu) = \min_{\lambda \in \Pi(\hat\mu_n, \mu)} \int \|x - y\|^p\, \lambda(dx, dy) = \min_{\pi \in \Pi(\hat\mu_{n,A}, \mu^A)} \int \|A^{-1}(x - y)\|^p\, \pi(dx, dy). \]

Consequently, if $\|A^{-1}\| = \sup_{\|x\| = 1} \|A^{-1} x\|$, we obtain that

\[ W_p^p(\hat\mu_n, \mu) \leq \|A^{-1}\|^p\, W_p^p(\hat\mu_{n,A}, \mu^A), \tag{24} \]

which is an equality if $A$ is a unitary matrix. Note also that $\mu \in \mathcal{D}_A(M,p)$ if and only if
$\mu^A \in \mathcal{D}(M,p)$.

Let $\mu_\eta$ be the distribution of the $\eta_i$'s. Since the coordinates of the $\eta_i$'s are independent,
$\mu_\eta$ can be written as $\mu_\eta = \mu_{\eta,1} \otimes \cdots \otimes \mu_{\eta,d}$. As in Section 3.1, let $r_j := 1/\mu^*_{\eta,j}$. Assume
that the $r_j$'s satisfy the condition (22). It follows from (24) and Theorem 4 that, taking
$h_1 = \cdots = h_d = (4d/(\gamma_2 \log(n)))^{1/\beta}$, there exists a positive constant $C$ such that

\[ \sup_{\mu \in \mathcal{D}_A(M,p)} \mathbb{E}_{(\mu \star \mu_\varepsilon)^{\otimes n}} \big( W_p^p(\mu, \hat\mu_n) \big) \leq C (\log n)^{-p/\beta}. \]
3.3 Examples of rates of convergence 

Gaussian noise. Assume that we observe $Y_1, \dots, Y_n$ in the multivariate convolution model
(1), where $\varepsilon_1$ is a centered non degenerate Gaussian random vector. In that case, there always
exists an invertible matrix $A$ such that the coordinates of $A\varepsilon_1$ are independent. The distribution
of $(A\varepsilon_1)_j$ is either a Dirac mass at zero or a centered Gaussian distribution with positive
variance. Since $\varepsilon_1$ is non degenerate, there exists at least one index $j_0$ for which $(A\varepsilon_1)_{j_0}$ is non
zero.

Now, the distribution of $(A\varepsilon_1)_{j_0}$ satisfies the assumptions of Theorem 2 for any $p \geq 1$
and $\beta = 2$ (Conditions (3) and (4) follow from Lemma 1). Moreover, denoting by $\mu_{\eta,j}$ the
distribution of $\eta_{1,j} = (A\varepsilon_1)_j$, the quantity $r_j = 1/\mu^*_{\eta,j}$ satisfies (22) for any $p \geq 1$ and
$\beta = 2$. Theorem 1 then follows from Theorems 2 and 4 (more precisely, the estimator $\hat\mu_n$ of
Theorem 1 is constructed as in Section 3.2).

Other supersmooth distributions. For $\alpha \in ]0,2[$, we denote by $s_\alpha$ the symmetric $\alpha$-stable
density, whose Fourier transform $q_\alpha$ is given by

\[ s^*_\alpha(x) = q_\alpha(x) = \exp(-|x|^\alpha). \]

Let $q_{\alpha,1} = q_\alpha$ and $q_{\alpha,2} = q_\alpha \star q_\alpha$. For any positive integer $k > 2$, define by induction
$q_{\alpha,k} = q_{\alpha,k-1} \star q_\alpha$.

Lemma 2. Let $k$ be a positive integer. The function $q_{\alpha,k}$ satisfies the following properties:

1. $q_{\alpha,k}$ is $k - 1$ times differentiable, and $q_{\alpha,k}^{(k-1)}$ is absolutely continuous, with almost sure
derivative $q_{\alpha,k}^{(k)}$. Moreover, if $\alpha \in ]0,1[$ then $q_{\alpha,k}^{(k-1)}$ is bounded, and if $\alpha \in [1,2[$ then $q_{\alpha,k}^{(k)}$ is
bounded.



2. There exist two positive constants $a_{\alpha,k}$ and $b_{\alpha,k}$ such that for any $x \in \mathbb{R}$,

\[ a_{\alpha,k} \exp(-|x|^\alpha) \leq q_{\alpha,k}(x) \leq b_{\alpha,k} \exp\Big( -\frac{|x|^\alpha}{2^{(k-1)\alpha}} \Big). \]
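
For $k = 2$ the upper bound can be checked numerically. Since either $|y| \geq |x|/2$ or $|x - y| \geq |x|/2$, one gets the elementary estimate $q_\alpha \star q_\alpha(x) \leq 2 \|q_\alpha\|_1 \exp(-|x|^\alpha / 2^\alpha)$; the constant $2\|q_\alpha\|_1$ is our own crude choice and is not claimed to be the constant $b_{\alpha,2}$ of the lemma. The sketch below approximates the convolution by a midpoint rule; function names and grid parameters are illustrative.

```python
import math


def q_alpha(x: float, alpha: float) -> float:
    """Fourier transform of the symmetric alpha-stable density: exp(-|x|^alpha)."""
    return math.exp(-abs(x) ** alpha)


def self_convolution(x: float, alpha: float, half_width: float = 40.0, steps: int = 8000) -> float:
    """Midpoint-rule approximation of (q_alpha * q_alpha)(x) on [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        y = -half_width + (i + 0.5) * h
        total += q_alpha(x - y, alpha) * q_alpha(y, alpha)
    return total * h


def crude_upper_bound(x: float, alpha: float, l1_norm: float) -> float:
    """2 * ||q_alpha||_1 * exp(-(|x|/2)^alpha), an elementary bound for k = 2."""
    return 2.0 * l1_norm * math.exp(-((abs(x) / 2.0) ** alpha))
```

For $\alpha = 1$ everything is explicit: $\|q_1\|_1 = 2$ and $q_1 \star q_1(x) = (1 + |x|) e^{-|x|}$, which gives a closed-form value to compare the numerical convolution against.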



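The proof of Lemma 2 is given in Appendix C. Next, for any integer $k \geq 2$, we introduce
the supersmooth density

\[ f_{\alpha,k}(x) = \frac{(s_\alpha(x))^k}{\int_{\mathbb{R}} (s_\alpha(t))^k\, dt}, \]

and we note that $f^*_{\alpha,k} = q_{\alpha,k}/q_{\alpha,k}(0)$ and $f_{\alpha,k}(x) = O(|x|^{-k(\alpha+1)})$. Let $r_{\alpha,k} = 1/f^*_{\alpha,k}$. According
to Lemma 2, it follows that $r_{\alpha,k}$ is $k$ times differentiable with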

l-SWISCgg^ for 
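Applying Lemma 2 we see that: if $\alpha \in ]0,1[$, then for any $\ell \in \{0, \dots, k-1\}$

\[ |r_{\alpha,k}^{(\ell)}(x)| \leq A_{\alpha,k} \exp(|x|^\alpha), \tag{25} \]

and the same holds for any $\ell \in \{0, \dots, k\}$ if $\alpha \in [1,2[$. Moreover, we also have the lower bound

\[ |r_{\alpha,k}(x)| \geq c_{\alpha,k} \exp\Big( \frac{|x|^\alpha}{2^{(k-1)\alpha}} \Big). \tag{26} \]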
Now, assume that we observe Yi,...,Y n in the multivariate convolution model ([T]). Let 
p > 1, assume that there exists an invertible matrix j4 such that, for any j £ {1, . . . , c?}, (j4ei)j 
has the distribution for some ay g]0, 2[ and such that kj > \p~\ + 1 + 21 aj . e ] 0jl [. Let 

a = maxi<j<d ay. 

Inequality (26) gives Condition (2) in Theorem 4 for $\beta = \alpha$. Lemma Q] can be applied with
$a = (\lceil p \rceil + 1 + 2 \cdot \mathbf{1}_{\alpha \in ]0,1[})(\alpha + 1)$ and then Conditions (3) and (JH) of Theorem 2 are also satisfied.
Next, according to (25), Condition (22) in Theorem U] is satisfied for $\beta = \alpha$. Theorems 2 and d]
finally give the following result:

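1. There exists a constant $C > 0$ such that for any estimator $\tilde\mu_n$ of the measure $\mu$:

\[ \liminf_{n \to \infty} \, (\log n)^{p/\alpha} \sup_{\mu \in \mathcal{D}_A(M,p)} \mathbb{E}_{(\mu * \mu_\varepsilon)^{\otimes n}} \left( W_p^p(\tilde\mu_n, \mu) \right) \geq C. \]

2. The estimator $\tilde\mu_n$ of $\mu$ constructed in Section 3.2 is such that

\[ \sup_{n \geq 1} \, \sup_{\mu \in \mathcal{D}_A(M,p)} (\log n)^{p/\alpha} \, \mathbb{E}_{(\mu * \mu_\varepsilon)^{\otimes n}} \left( W_p^p(\tilde\mu_n, \mu) \right) \leq K, \]

for some positive constant $K$.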
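
To get a feeling for how slow the logarithmic rate in the two statements above is, the following minimal sketch evaluates $(\log n)^{-p/\alpha}$ for a few sample sizes. The function names `log_rate` and `growth_exponent_to_halve` are illustrative and do not come from the paper, and the values $p = 1$, $\alpha = 1$ are an arbitrary choice within the admissible ranges; constants are ignored.

```python
import math

def log_rate(n, p, alpha):
    """Rate (log n)^(-p/alpha) for the risk in W_p^p, up to constants."""
    return math.log(n) ** (-p / alpha)

def growth_exponent_to_halve(p, alpha):
    """Exponent c such that replacing n by n**c halves (log n)^(-p/alpha).

    Halving the rate requires multiplying log n by 2**(alpha/p),
    which amounts to raising n to the power 2**(alpha/p).
    """
    return 2.0 ** (alpha / p)

p, alpha = 1.0, 1.0  # illustrative values (assumption, not from the paper)
for n in (10**2, 10**4, 10**8):
    print(n, round(log_rate(n, p, alpha), 4))
print("n must be raised to the power", growth_exponent_to_halve(p, alpha))
```

With $p = \alpha = 1$, halving the rate requires squaring the sample size, and the required power grows with $\alpha$.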



Mixtures of distributions. Of course, the independent coordinates of $A\varepsilon_1$ need not all be
Gaussian or even supersmooth.

For instance if there exists $j_0$ such that $(A\varepsilon_1)_{j_0}$ is a non degenerate Gaussian random variable,
and the other coordinates have a distribution which is either a Dirac mass at 0, or a Laplace
distribution, or a supersmooth distribution $f_{\alpha,k}$ for some $\alpha \in ]0,2[$ and $k \geq \lceil p \rceil + 1 + 2 \cdot \mathbf{1}_{\alpha \in ]0,1[}$
(this list is non exhaustive), then the estimator $\tilde\mu_n$ of $\mu$ constructed in Section 3.2 is such that

\[ \sup_{n \geq 1} \, \sup_{\mu \in \mathcal{D}_A(M,p)} (\log n)^{p/2} \, \mathbb{E}_{(\mu * \mu_\varepsilon)^{\otimes n}} \left( W_p^p(\tilde\mu_n, \mu) \right) \leq K, \]

and this rate is minimax.

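In the same way if there exists $j_0$ such that $(A\varepsilon_1)_{j_0}$ is supersmooth with density $f_{\alpha,k}$ for
some $\alpha \in ]0,2[$ and $k \geq \lceil p \rceil + 1 + 2 \cdot \mathbf{1}_{\alpha \in ]0,1[}$, and the other coordinates have a distribution which is
either a Dirac mass at 0, or a Laplace distribution, or a supersmooth distribution $f_{\beta,m}$ for some
$\beta \in ]0,\alpha]$ and $m \geq \lceil p \rceil + 1 + 2 \cdot \mathbf{1}_{\beta \in ]0,1[}$, then the estimator $\tilde\mu_n$ of $\mu$ constructed in Section 3.2 is
such that

\[ \sup_{n \geq 1} \, \sup_{\mu \in \mathcal{D}_A(M,p)} (\log n)^{p/\alpha} \, \mathbb{E}_{(\mu * \mu_\varepsilon)^{\otimes n}} \left( W_p^p(\tilde\mu_n, \mu) \right) \leq K, \]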

and this rate is minimax. 

4 Discussion 

In the supersmooth case, we have seen that lower bounds for the Wasserstein deconvolution
problem in any dimension can be deduced from lower bounds for the deconvolution of the c.d.f.
in dimension one. But this method cannot work in the ordinary smooth case for $d > 1$, because,
contrary to the supersmooth case, the rates of convergence depend on the dimension.

Let us briefly discuss the case where $d = 1$ and the error distribution is ordinary smooth.
It is actually well known that establishing optimal rates of convergence in the ordinary smooth
case is more difficult than in the supersmooth case, even for pointwise estimation, as noticed by
Fan in [Fan91b]. When the density is $m$ times differentiable, Fan gives in this paper pointwise
lower and upper bounds for the estimation of the c.d.f. in both the supersmooth case and the
ordinary smooth case. He finds the optimal rates in the supersmooth case and he conjectures
that his upper bound is actually optimal in the ordinary smooth case (see his Remark 3).
Finding the optimal pointwise rates for the deconvolution of the c.d.f. in the ordinary smooth case was
an open question until recently. This problem has been solved in [DGJ11] when the density
belongs to a Sobolev class.

When $d = 1$ and the error distribution is ordinary smooth, some results about integrated
rates of convergence for the density (and its derivatives) can be found in [Fan93, Fan91a], but the
case of the c.d.f. (for the integrated risk) is not studied in these papers. However, some lower
bounds can be easily computed by following the method of [Fan93] and using the pointwise
rates of [Fan91b]: for a class of ordinary smooth noise densities of order $\beta$ and assuming only
that the unknown distribution $\mu$ has a moment of order 4, we find that the minimax integrated
risk is lower bounded by $n^{-1/(2\beta+1)}$, and we then obtain the same lower bound for $W_1$. As for
the pointwise estimation described in [Fan91b], these rates do not match the upper bounds
given by Proposition [JJ for $W_1$. For instance, for Laplace errors ($\beta = 2$), the rate of convergence
of the kernel estimator under $W_1$ is upper bounded by $n^{-1/7}$. We are currently working on this
issue, and we conjecture that the minimax rate of convergence for $W_1$ when $d = 1$ is of order
$n^{-1/(2\beta+2)}$ for a class of ordinary smooth error distributions of order $\beta$. If this conjecture is
correct, it means that the existing lower and upper bounds have to be improved.
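
The gap described in the last paragraph can be made concrete with a few lines of exact arithmetic. The sketch below only restates the exponents quoted in the text for Laplace errors (order 2); the function and constant names are illustrative, and the upper-bound exponent 1/7 is taken directly from the text rather than from a general formula.

```python
from fractions import Fraction

def lower_bound_exponent(beta):
    """Exponent e of the lower bound n^(-e), with e = 1/(2*beta + 1)."""
    return Fraction(1, 2 * beta + 1)

def conjectured_exponent(beta):
    """Exponent e of the conjectured minimax rate n^(-e), with e = 1/(2*beta + 2)."""
    return Fraction(1, 2 * beta + 2)

BETA_LAPLACE = 2                          # order of the Laplace noise, as in the text
UPPER_EXPONENT_LAPLACE = Fraction(1, 7)   # kernel estimator rate n^(-1/7), quoted from the text

lower = lower_bound_exponent(BETA_LAPLACE)
conjectured = conjectured_exponent(BETA_LAPLACE)
print("lower bound exponent:", lower)
print("conjectured exponent:", conjectured)
print("upper bound exponent:", UPPER_EXPONENT_LAPLACE)
# The conjectured exponent lies strictly between the two known exponents,
# so neither existing bound would be sharp if the conjecture holds.
print(UPPER_EXPONENT_LAPLACE < conjectured < lower)
```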

A Some known lemmas 

The following lemma is given in [FT93] (Lemma 1):

Lemma 3. Let $H$ be a function such that

\[ |H(t)| \leq C(1 + t^2)^{-r} \]

for some $C > 0$ and some $r > 0.5$. Then there exists a positive constant $C$ such that for any
sequence $b_n \to \infty$,



Y j \H{b n (t-s/n))\ <C(l + t 2 )- r . 



Let $f_{0,r}$ be the function defined in (5). The following lemma can be found in [Fan91b] (Lemma 5.1):

Lemma 4. For any probability measure $\mu$, there exists a constant $C_r > 0$ such that

\[ \mu * f_{0,r}(t) \geq C_r |t|^{-2r} \quad \text{as } |t| \text{ tends to infinity.} \]

The following lemma is rewritten from [Fan91b] (Lemma 5.2):

Lemma 5. Let $r > 0$. Suppose that $P(|\varepsilon_1 - t| \leq |t|^{\kappa_1}) = O(|t|^{-\kappa_2})$ as $|t|$ tends to infinity for
some $0 < \kappa_1 < 1$ and $\kappa_2 > 1$. Let $H$ be a bounded function such that $|H(t)| \leq O(|t|^{-2r})$ for
some $r > \kappa_2/(2\kappa_1)$. Then there exists a large $T$ and a constant $C$ such that when $|v|/b_n \geq T$:

\[ \int H(v - y) \, g(y/b_n) \, dy/b_n \leq C \, (|v|/b_n)^{-\kappa_2}. \]

B Distances between probability measures 

The first lemma follows straightforwardly from the definition of $W_1$.

Lemma 6. Let $\mu$ and $\tilde\mu$ be two measures on $\mathbb{R}^d$ with finite first moments, and let $\mu_1$ and $\tilde\mu_1$ be
their first marginals. Then $W_1(\mu, \tilde\mu) \geq W_1(\mu_1, \tilde\mu_1)$.
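
For completeness, here is one way the argument behind Lemma 6 can be written out; it is a sketch of the standard projection argument, not a quotation from the paper. Write $\mu$ and $\tilde\mu$ for the two measures and $\mu_1$, $\tilde\mu_1$ for their first marginals. The notation $\pi_1$ (the image of a coupling $\pi$ under $(x, y) \mapsto (x_1, y_1)$) is introduced here for illustration only.

```latex
% Let \pi be any coupling of (\mu, \tilde\mu), and let \pi_1 be its image
% under the map (x, y) \mapsto (x_1, y_1) (notation assumed for this sketch).
\begin{align*}
\int \|x - y\| \, \pi(dx, dy)
  &\geq \int |x_1 - y_1| \, \pi(dx, dy)
     && \text{since } |x_1 - y_1| \leq \|x - y\| \\
  &= \int |u - v| \, \pi_1(du, dv)
     && \text{by definition of the image measure} \\
  &\geq W_1(\mu_1, \tilde\mu_1)
     && \text{since } \pi_1 \in \Pi(\mu_1, \tilde\mu_1).
\end{align*}
% Taking the infimum over \pi \in \Pi(\mu, \tilde\mu) on the left-hand side
% gives W_1(\mu, \tilde\mu) \geq W_1(\mu_1, \tilde\mu_1).
```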

The following lemma is a particular case of the famous Le Cam's inequalities. See for instance
Section 2.4 in [Tsy09] for more details.

Lemma 7. Let $h$ and $\tilde h$ be two densities on $\mathbb{R}^n$, then

\[ \int_{\mathbb{R}^n} \min\{h(x), \tilde h(x)\} \, dx \geq \frac{1}{2} \left( \int_{\mathbb{R}^n} \sqrt{h(x) \tilde h(x)} \, dx \right)^2. \]



The next lemma can be found for instance in Section 2.4 of [Tsy09].

Lemma 8. Let $h$ and $\tilde h$ be two densities for the Lebesgue measure on $\mathbb{R}$, then

\[ \int \sqrt{h(y) \tilde h(y)} \, dy \geq 1 - \frac{1}{2} \chi^2(h, \tilde h). \]
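
As a sanity check, the inequalities of Lemmas 7 and 8 can be verified numerically on a simple pair of densities. The sketch below uses two unit-variance Gaussian densities in dimension one and assumes the convention $\chi^2(h, \tilde h) = \int (h - \tilde h)^2 / \tilde h$; both the choice of densities and the helper names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gaussian_density(x, mean):
    """Density of N(mean, 1) evaluated on the grid x."""
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2.0 * np.pi)

def integrate(values, x):
    """Trapezoidal rule on the grid x."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(x)))

m = 1.0  # illustrative mean shift (assumption)
x = np.linspace(-15.0, 15.0, 300001)
h = gaussian_density(x, 0.0)
h_tilde = gaussian_density(x, m)

overlap = integrate(np.minimum(h, h_tilde), x)      # left-hand side of Lemma 7
affinity = integrate(np.sqrt(h * h_tilde), x)       # integral of sqrt(h * h_tilde)
chi2 = integrate((h - h_tilde) ** 2 / h_tilde, x)   # assumed chi-square convention

print("Lemma 7 holds:", overlap >= 0.5 * affinity ** 2)
print("Lemma 8 holds:", affinity >= 1.0 - 0.5 * chi2)
```

For this Gaussian pair the affinity has the closed form $\exp(-m^2/8)$ and the chi-square divergence equals $\exp(m^2) - 1$, which gives an independent check of the numerical integration.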



C Auxiliary results 



Proof of Lemma 2. The proof of Item 1 is standard. Note first that $q_\alpha$ is bounded and
absolutely continuous with almost sure derivative $q'_\alpha$, which is bounded as soon as $\alpha \in [1,2[$.
This proves the result for $k = 1$. It follows that $q_\alpha * q_\alpha$ is differentiable with derivative $q_\alpha * q'_\alpha$.
This derivative is absolutely continuous with almost sure derivative $q'_\alpha * q'_\alpha$. Moreover $q_\alpha * q'_\alpha$ is
bounded (because $q'_\alpha$ is integrable and $q_\alpha$ is bounded), and if $\alpha \in [1,2[$ then $q'_\alpha * q'_\alpha$ is bounded
(because in that case $q'_\alpha$ is bounded). This proves Item 1 for $k = 2$. The general case follows
by induction.
In the same way, it suffices to prove Item 2 for $k = 2$, and the general case follows by
induction. Since $q_\alpha * q_\alpha$ is symmetric, it suffices to prove the result for $x > 0$. Now, for any
$x > 0$,

\begin{align*}
q_\alpha * q_\alpha(x) &= 2 \int_{x/2}^{+\infty} \exp(-|x - t|^\alpha - t^\alpha) \, dt \\
&\leq 2 \exp(-(x/2)^\alpha) \int_{-x/2}^{+\infty} \exp(-|u|^\alpha) \, du \\
&\leq b_{\alpha,2} \exp(-(x/2)^\alpha).
\end{align*}

On the other hand, there exists a positive constant $c_\alpha$ such that, for any $x \geq 1$,

\begin{align*}
q_\alpha * q_\alpha(x) &\geq \int_{x/2}^{x} \exp(-|x - t|^\alpha - t^\alpha) \, dt \\
&\geq \exp(-x^\alpha) \int_{0}^{x/2} \exp(-u^\alpha) \, du \\
&\geq c_\alpha \exp(-x^\alpha). \tag{27}
\end{align*}

The function $x \mapsto q_\alpha * q_\alpha(x) \exp(x^\alpha)$ is continuous and positive on $[0,1]$ and thus (27) is
also true on $[0,1]$ for some other positive constant $c'_\alpha$. The lower bound follows by taking
$a_{\alpha,2} = \min\{c_\alpha, c'_\alpha\}$.
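
The two bounds obtained in this proof for $k = 2$ can be checked numerically. The sketch below approximates the convolution by a trapezoidal rule on a truncated grid and uses explicit constants read off from the argument above: $4\Gamma(1 + 1/\alpha)$ for the upper bound (twice the integral of $\exp(-|u|^\alpha)$ over the real line) and $\tfrac{1}{2}\exp(-2^{-\alpha})$ as a lower bound for $c_\alpha$ when $x \geq 1$. The helper names, the grid parameters and these particular constants are assumptions made for illustration.

```python
import math
import numpy as np

def q_conv2(x, alpha, half_width=60.0, n_points=240001):
    """Trapezoidal approximation of (q_alpha * q_alpha)(x) on a truncated grid."""
    t = np.linspace(-half_width, half_width, n_points)
    integrand = np.exp(-np.abs(x - t) ** alpha - np.abs(t) ** alpha)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)))

def upper_constant(alpha):
    """Twice the integral of exp(-|u|^alpha) over R, i.e. 4 * Gamma(1 + 1/alpha)."""
    return 4.0 * math.gamma(1.0 + 1.0 / alpha)

def lower_constant(alpha):
    """A lower bound for the integral of exp(-u^alpha) over [0, 1/2]."""
    return 0.5 * math.exp(-(0.5 ** alpha))

def bounds_hold(alpha, xs):
    """Check lower <= q_alpha * q_alpha(x) <= upper for every x in xs (x >= 1)."""
    for x in xs:
        value = q_conv2(x, alpha)
        lower = lower_constant(alpha) * math.exp(-(x ** alpha))
        upper = upper_constant(alpha) * math.exp(-((x / 2.0) ** alpha))
        if not (lower <= value <= upper):
            return False
    return True

xs = [1.0, 2.0, 4.0, 8.0]
print(bounds_hold(1.0, xs), bounds_hold(1.5, xs))
```

For $\alpha = 1$ the convolution is explicit, $q_1 * q_1(x) = (1 + |x|) e^{-|x|}$, which provides an exact reference value for the numerical integration.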